Relative Lempel-Ziv factorization for efficient storage and retrieval of web collections
Similar resources
Relative Lempel-Ziv Factorization for Efficient Storage and Retrieval of Web Collections
Compression techniques that support fast random access are a core component of any information system. Current state-of-the-art methods group documents into fixed-size blocks and compress each block with a general-purpose adaptive algorithm such as gzip. Random access to a specific document then requires decompression of a block. The choice of block size is critical: it trades between compressi...
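The abstract contrasts block-based gzip compression with relative Lempel-Ziv (RLZ) factorization. As a rough, hypothetical sketch of the general idea (not the authors' implementation): each document is parsed greedily into factors that copy substrings of a shared dictionary, so any single document can be decoded on its own without decompressing its neighbours. The names `rlz_factorize` and `rlz_decode`, the `(position, length)` factor format, and the `(char, 0)` literal convention are assumptions for illustration. The naive longest-match scan here is quadratic; a practical system would more likely search the dictionary with an index such as a suffix array.

```python
from typing import List, Tuple, Union

# Hypothetical factor format: (dictionary_start, length), or (literal_char, 0)
# when the character does not occur in the dictionary at all.
Factor = Tuple[Union[int, str], int]


def rlz_factorize(dictionary: str, doc: str) -> List[Factor]:
    """Greedy RLZ parse of `doc` relative to `dictionary` (naive search)."""
    factors: List[Factor] = []
    i = 0
    while i < len(doc):
        best_pos, best_len = -1, 0
        # Try every dictionary start and keep the longest match for doc[i:].
        for p in range(len(dictionary)):
            length = 0
            while (i + length < len(doc) and p + length < len(dictionary)
                   and dictionary[p + length] == doc[i + length]):
                length += 1
            if length > best_len:
                best_pos, best_len = p, length
        if best_len == 0:
            factors.append((doc[i], 0))  # character absent from dictionary
            i += 1
        else:
            factors.append((best_pos, best_len))
            i += best_len
    return factors


def rlz_decode(dictionary: str, factors: List[Factor]) -> str:
    """Rebuild one document from its own factors and the shared dictionary."""
    parts = []
    for pos, length in factors:
        if length == 0:
            parts.append(pos)  # literal character
        else:
            parts.append(dictionary[pos:pos + length])
    return "".join(parts)


dictionary = "abracadabra"
factors = rlz_factorize(dictionary, "cadabrax")
print(factors)  # [(4, 7), ('x', 0)]
print(rlz_decode(dictionary, factors))  # cadabrax
```

Because each document's factors refer only to the dictionary, retrieving one document touches only that document's factor list.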
Relative Lempel-Ziv Compression of Genomes for Large-Scale Storage and Retrieval
Self-indexes – data structures that simultaneously provide fast search of and access to compressed text – are promising for genomic data but in their usual form are not able to exploit the high level of replication present in a collection of related genomes. Our ‘RLZ’ approach is to store a self-index for a base sequence and then compress every other sequence as an LZ77 encoding relative to the...
Lempel-Ziv Dimension for Lempel-Ziv Compression
This paper describes the Lempel-Ziv dimension (a Hausdorff-like dimension inspired by the LZ78 parsing), its fundamental properties, and its relation to Hausdorff dimension. It is shown that, for individual infinite sequences, the Lempel-Ziv dimension matches the asymptotic Lempel-Ziv compression ratio. This fact is used to describe results on Lempel-Ziv compression in terms of dime...
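The abstract ties the Lempel-Ziv dimension to the LZ78 parsing. As a hedged illustration of that parsing only (the dimension itself is not computed here), the hypothetical helper `lz78_phrases` splits a string into LZ78 phrases. Each phrase is a previously seen phrase (or the empty string) extended by one new symbol, and the compression LZ78 achieves depends on how few such phrases a sequence needs.

```python
from typing import List


def lz78_phrases(s: str) -> List[str]:
    """Split `s` into LZ78 phrases: each new phrase is the shortest
    prefix of the remaining input not already used as a phrase."""
    seen = set()
    phrases: List[str] = []
    current = ""
    for ch in s:
        current += ch
        if current not in seen:
            # current = (earlier phrase or empty string) + one new symbol
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:
        phrases.append(current)  # final phrase may repeat an earlier one
    return phrases


print(lz78_phrases("aababcabcd"))  # ['a', 'ab', 'abc', 'abcd']
```

Because every prefix of a phrase is itself a phrase, "shortest unseen prefix" and "longest known phrase plus one symbol" describe the same parse.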
Computing Reversed Lempel-Ziv Factorization Online
Kolpakov and Kucherov proposed a variant of the Lempel-Ziv factorization, called the reversed Lempel-Ziv (RLZ) factorization (Theoretical Computer Science, 410(51):5365–5373, 2009). In this paper, we present an on-line algorithm that computes the RLZ factorization of a given string w of length n in O(n log n) time using O(n log σ) bits of space, where σ ≤ n is the alphabet size. Also, we introd...
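To make the factorization in this abstract concrete, here is a naive, quadratic-time sketch, not the paper's O(n log n) online algorithm. It assumes the definition in which each factor is the longest prefix of the unparsed suffix whose reversal occurs as a substring of the already-parsed prefix, or a single character when no such non-empty prefix exists. The function name `reversed_lz` is hypothetical.

```python
from typing import List


def reversed_lz(w: str) -> List[str]:
    """Naive reversed Lempel-Ziv factorization (assumed definition)."""
    factors: List[str] = []
    j = 0
    while j < len(w):
        parsed = w[:j]
        best = 0
        for length in range(1, len(w) - j + 1):
            # If a candidate's reversal occurs in the parsed prefix, so does
            # the reversal of every shorter candidate, so stop at the first miss.
            if w[j:j + length][::-1] in parsed:
                best = length
            else:
                break
        size = max(best, 1)  # fall back to a single character
        factors.append(w[j:j + size])
        j += size
    return factors


print(reversed_lz("abcba"))  # ['a', 'b', 'c', 'ba']
```

The early `break` is safe because the reversal of a shorter candidate is a suffix of the reversal of a longer one.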
Lempel-Ziv factorization: Simple, fast, practical
For decades the Lempel-Ziv (LZ77) factorization has been a cornerstone of data compression and string processing algorithms, and uses for it are still being uncovered. For example, LZ77 is central to several recent text indexing data structures designed to search highly repetitive collections. However, in many applications computation of the factorization remains a bottleneck in practice. In th...
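As a deliberately naive sketch of the LZ77 factorization this abstract discusses, assume a common definition: each factor is either a character not seen before or the longest prefix of the remaining text that also starts at some earlier position, with overlapping sources allowed. The names `lz77_factorize` and `lz77_decode` and the `(source, length)` / `(char, 0)` encoding are hypothetical. The brute-force scan is quadratic; practical factorizers avoid it, commonly with suffix-array-based techniques.

```python
from typing import List, Tuple, Union

Factor = Tuple[Union[int, str], int]  # (earlier_position, length) or (char, 0)


def lz77_factorize(s: str) -> List[Factor]:
    """Naive LZ77 factorization; a source may overlap the factor itself."""
    factors: List[Factor] = []
    i, n = 0, len(s)
    while i < n:
        best_pos, best_len = -1, 0
        for p in range(i):  # every earlier starting position
            length = 0
            while i + length < n and s[p + length] == s[i + length]:
                length += 1
            if length > best_len:
                best_pos, best_len = p, length
        if best_len == 0:
            factors.append((s[i], 0))  # first occurrence of this character
            i += 1
        else:
            factors.append((best_pos, best_len))
            i += best_len
    return factors


def lz77_decode(factors: List[Factor]) -> str:
    out: List[str] = []
    for pos, length in factors:
        if length == 0:
            out.append(pos)
        else:
            for k in range(length):
                out.append(out[pos + k])  # char by char so overlaps work
    return "".join(out)


print(lz77_factorize("abababa"))  # [('a', 0), ('b', 0), (0, 5)]
print(lz77_decode(lz77_factorize("abababa")))  # abababa
```

The third factor copies five characters starting at position 0 even though only two exist yet; copying one character at a time makes such self-overlapping references decode correctly.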
Journal: Proceedings of the VLDB Endowment
Year: 2011
ISSN: 2150-8097
DOI: 10.14778/2078331.2078341